Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Times cited: 19
Authors
He, Wenying [1]
Ju, Ying [2]
Zeng, Xiangxiang [2]
Liu, Xiangrong [2]
Zou, Quan [1, 3]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China
[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
non-coding DNA; DNA sequence; feature representation; genome synthesis; support vector machine; TERT PROMOTER MUTATIONS; PHYSICOCHEMICAL PROPERTIES; FEATURE-EXTRACTION; FEATURE-SELECTION; WEB SERVERS; PROTEIN; SITES; INFORMATION; RECURRENT; GENOME;
DOI
10.3389/fmicb.2018.02174
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
With the rapid development of high-speed sequencing technologies and the implementation of many whole-genome sequencing projects, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage are emerging in quick succession. In particular, the rapid growth of DNA assembly technology may greatly advance the creation of artificial life. DNA assembly technology, however, requires a large number of well-characterized target sequences as data support. Non-coding DNA (ncDNA) sequences make up most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. It is therefore necessary to develop machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and report a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected, using the support vector machine learning method, from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer composition. Sc-ncDNAPred achieved an overall accuracy of 0.98.
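The candidate feature sets named in the abstract (mononucleotide through hexamer composition) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_composition` and `feature_sets`, the normalization to relative frequencies, the skipping of windows with ambiguous bases, and the example sequence are all assumptions made for illustration. The resulting vectors are the kind of input a support vector machine could then be trained on.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq, k):
    """Return relative k-mer frequencies in fixed lexicographic order.

    Hypothetical helper: the record does not specify how the original
    predictor normalizes counts or handles ambiguous bases.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing non-ACGT symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def feature_sets(seq, max_k=6):
    """Build one feature vector per k, for k = 1 (mononucleotide) to max_k."""
    return {k: kmer_composition(seq, k) for k in range(1, max_k + 1)}


if __name__ == "__main__":
    example = "ATGCGATACGCTTGA"  # made-up sequence, not from the benchmark
    for k, vec in feature_sets(example).items():
        print(k, len(vec))  # vector length is 4**k
```

Each value of k yields a vector of length 4^k, so the hexamer representation has 4096 dimensions. Comparing classifiers trained on each representation is one plausible way to select the best-performing one.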
Pages: 9