Learning to predict single-wall carbon nanotube-recognition DNA sequences

Cited by: 79
Authors
Yang, Yoona [1]
Zheng, Ming [2]
Jagota, Anand [1,3]
Affiliations
[1] Lehigh Univ, Dept Chem & Biomol Engn, Bethlehem, PA 18015 USA
[2] NIST, Mat Sci & Engn Div, Gaithersburg, MD 20899 USA
[3] Lehigh Univ, Dept Bioengn, Bethlehem, PA 18015 USA
Keywords
STRANDED-DNA; BINDING; GRAPHITE; MOTIFS; ENERGY
DOI
10.1038/s41524-018-0142-3
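Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract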
DNA/single-wall carbon nanotube (SWCNT) hybrids have enabled many applications because of their special ability to disperse and sort SWCNTs by their chirality and handedness. Much work has been done to discover sequences which recognize specific chiralities of SWCNT, and significant progress has been made in understanding the underlying structure and thermodynamics of these hybrids. Nevertheless, de novo prediction of recognition sequences remains essentially impossible and the success rate for their discovery by search of the vast single-stranded DNA library is very low. Here, we report an effective way of predicting recognition sequences based on machine learning analysis of existing experimental sequence data sets. Multiple input feature construction methods (position-specific, term-frequency, combined or segmented term frequency vector, and motif-based feature) were used and compared. The transformed features were used to train several classifier algorithms (logistic regression, support vector machine, and artificial neural network). Trained models were used to predict new sets of recognition sequences, and consensus among a number of models was used successfully to counteract the limited size of the data set. Predictions were tested using aqueous two-phase separation. New data thus acquired were used to retrain the models by adding an experimentally tested new set of predicted sequences to the original set. The frequency of finding correct recognition sequences by the trained model increased to >50% from the ~10% success rate in the original training data set.
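The abstract names position-specific and term-frequency feature construction and a consensus step across models, but does not give their details. The following is a minimal sketch of how such features might be built for a DNA sequence, not the authors' implementation. The function names, the one-hot layout, the choice of overlapping k-mers with k=2, and the `min_agree` threshold are all assumptions made for illustration, and the example sequence is arbitrary.

```python
from itertools import product

BASES = "ACGT"  # assumed alphabet order for all feature vectors


def position_specific(seq):
    """One-hot encode each position: four 0/1 entries per base (assumed layout)."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def term_frequency(seq, k=2):
    """Count every possible k-mer over overlapping windows (k=2 is an assumption)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return [counts[m] for m in kmers]


def consensus(votes, min_agree):
    """Accept a candidate only if at least min_agree models vote 1 (hypothetical rule)."""
    return sum(votes) >= min_agree


if __name__ == "__main__":
    toy = "TTATT"  # arbitrary toy sequence, not taken from the paper
    print(position_specific(toy))
    print(term_frequency(toy))
    print(consensus([1, 1, 0], min_agree=2))
```

The position-specific vector grows with sequence length, while the term-frequency vector has a fixed length of 4^k regardless of length; either could be passed to a classifier such as logistic regression.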
Pages: 7