A Sequential Multidimensional Analysis Algorithm for Aptamer Identification based on Structure Analysis and Machine Learning

Times cited: 55
Authors
Song, Jia [1]
Zheng, Yuan [1]
Huang, Mengjiao [2, 3]
Wu, Lingling [1]
Wang, Wei [1]
Zhu, Zhi [2, 3]
Song, Yanling [1, 2, 3]
Yang, Chaoyong [1, 2, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Med, Renji Hosp, Inst Mol Med, Shanghai 200127, Peoples R China
[2] Xiamen Univ, State Key Lab Phys Chem Solid Surface, Key Lab Chem Biol Fujian Prov, Key Lab Analyt Chem,Coll Chem & Chem Engn, Xiamen 361005, Peoples R China
[3] Xiamen Univ, Coll Chem & Chem Engn, Dept Chem Biol, Xiamen 361005, Peoples R China
Funding
US National Science Foundation; National Key Research and Development Program;
Keywords
DNA APTAMERS; MOLECULE; QUADRUPLEXES; DISCOVERY; EFFICIENT; SELECTION;
DOI
10.1021/acs.analchem.9b05203
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
Molecular recognition ligands are of great significance in many fields, but our ability to develop new recognition molecules remains to be expanded. Here, we developed a Sequential Multidimensional Analysis algoRiThm for aptamer discovery (SMART-Aptamer) from high-throughput sequencing (HTS) data of SELEX libraries based on multilevel structure analysis and unsupervised machine learning to discover nucleic acid recognition ligands with high accuracy and efficiency. We validated SMART-Aptamer with three sets of HTS data from screening pools against hESCs, EpCAM, and CSV. High affinity aptamers for all three targets were successfully obtained, and the results revealed that SMART-Aptamer is able to pick out high affinity aptamers with low false positive and negative rates. With the advantages of accuracy, efficiency, and robustness, SMART-Aptamer represents a paradigm-shift strategy for the discovery of binding ligands for a variety of biomedical applications.
Pages: 3307-3314
Number of pages: 8