Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Cited by: 18
Authors
He, Wenying [1]
Ju, Ying [2]
Zeng, Xiangxiang [2]
Liu, Xiangrong [2]
Zou, Quan [1,3]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China
[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
non-coding DNA; DNA sequence; feature representation; genome synthesis; support vector machine; TERT PROMOTER MUTATIONS; PHYSICOCHEMICAL PROPERTIES; FEATURE-EXTRACTION; FEATURE-SELECTION; WEB SERVERS; PROTEIN; SITES; INFORMATION; RECURRENT; GENOME;
DOI
10.3389/fmicb.2018.02174
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
With the rapid development of high-speed sequencing technologies and the implementation of many whole-genome sequencing projects, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage are emerging in succession. In particular, the rapid development of DNA assembly technology may greatly advance the creation of artificial life. DNA assembly, however, needs a large number of target sequences with known information as supporting data. Non-coding DNA (ncDNA) sequences make up most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. It is therefore necessary to develop machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and built a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected from a group comprising mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer compositions, using the support vector machine learning method. Sc-ncDNAPred achieved an overall accuracy of 0.98.
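The k-mer feature representations named in the abstract (mononucleotide through hexamer) can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so this assumes each feature is the normalized occurrence frequency of one k-mer over the alphabet A, C, G, T; the function name `kmer_frequencies` is hypothetical and is not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalized frequencies of all 4**k DNA k-mers in seq.

    Assumption: features are plain occurrence frequencies; the paper's
    exact encoding is not stated in the abstract.
    """
    seq = seq.upper()
    # Fixed lexicographic order (AA, AC, AG, ...) so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other ambiguity codes
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid window: return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Dimer (k=2) feature vector with 16 entries for a toy sequence.
features = kmer_frequencies("ACGTACGT", 2)
```

A vector of this kind has 4 entries for mononucleotides and 4096 for hexamers, and could be passed to any support vector machine implementation as the input representation.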
Pages: 9