Classification of DNA Sequences: Performance Evaluation of Multiple Machine Learning Methods

Cited by: 1
Authors
Wang, Yiren [1]
Khandelwal, Vikram [2]
Das, Arindam K. [3]
Anantram, M. P. [4]
Affiliations
[1] Univ Washington, Dept Elect & Comp Engn, Seattle, WA USA
[2] Interlake High Sch, Bellevue, WA USA
[3] Eastern Washington Univ, Dept Comp Sci & Elect Engn, Cheney, WA USA
[4] Univ Washington, Dept Elect & Comp Engn, Seattle, WA USA
Source
2022 IEEE 22ND INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY (NANO) | 2022
DOI
10.1109/NANO54668.2022.9928773
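Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract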
Polymerase chain reaction (PCR) has long been the mainstay in genetic sequencing and identification. Irrespective of whether short-read or long-read technologies are adopted, PCR methods are generally time-consuming and expensive. Recently, an all-electronic approach, the so-called Single Molecule Break Junction (SMBJ) method, has been proposed as a possible alternative to PCR. In this article, we evaluate the performance of four different classifier models on the current signatures of ten short strand sequences, including a pair that differs by a single mismatch. We find that a gradient boosted tree classifier model achieves impressive accuracies, ranging from approximately 96% for molecules differing by a single mismatch to 99.5% otherwise.
Pages: 333-336
Page count: 4