Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Times cited: 18
Authors
He, Wenying [1]
Ju, Ying [2]
Zeng, Xiangxiang [2]
Liu, Xiangrong [2]
Zou, Quan [1,3]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China
[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
non-coding DNA; DNA sequence; feature representation; genome synthesis; support vector machine; TERT promoter mutations; physicochemical properties; feature extraction; feature selection; web servers; protein; sites; information; recurrent; genome
DOI
10.3389/fmicb.2018.02174
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
With the rapid development of high-speed sequencing technologies and the implementation of many whole-genome sequencing projects, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage, along with other cutting-edge technologies, have emerged in succession. In particular, the rapid development of DNA assembly technology may greatly advance the creation of artificial life. Meanwhile, DNA assembly technology requires a large number of target sequences with known information as supporting data. Non-coding DNA (ncDNA) sequences make up most of the genomes of organisms, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are too expensive for genome-wide detection. It is therefore necessary to develop machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and report a support vector machine (SVM)-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. Using the SVM learning method, the optimal feature extraction strategy was selected from a group including mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer features. Sc-ncDNAPred achieved an overall accuracy of 0.98.
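The abstract names k-mer composition features (mononucleotide through hexamer) as the candidate feature representations. Below is a minimal sketch of how such features could be computed, assuming normalized overlapping k-mer frequencies over A/C/G/T. The function names are hypothetical, and the paper's exact encoding (normalization, handling of ambiguous bases) is an assumption here. The SVM step is not shown. In a setup like the one described, each candidate feature set would be passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`), and the set with the best cross-validated accuracy would be kept.

```python
from __future__ import annotations

from itertools import product

NUCLEOTIDES = "ACGT"

# Candidate feature sets named in the abstract, mapped to k (assumed mapping).
CANDIDATE_K = {
    "mononucleotide": 1,
    "dimer": 2,
    "trimer": 3,
    "tetramer": 4,
    "pentamer": 5,
    "hexamer": 6,
}


def kmer_vocabulary(k: int) -> list[str]:
    """All 4**k possible k-mers over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized overlapping k-mer frequency vector of length 4**k.

    Assumption: windows containing any character other than A/C/G/T
    (e.g. 'N') are skipped and do not count toward the total.
    """
    seq = seq.upper()
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    # Slide a window of width k one base at a time.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid windows.
        return [0.0] * len(vocab)
    return [c / total for c in counts]


def feature_sets(seq: str) -> dict[str, list[float]]:
    """Compute every candidate k-mer feature set for one sequence."""
    return {name: kmer_frequencies(seq, k) for name, k in CANDIDATE_K.items()}


if __name__ == "__main__":
    vec = kmer_frequencies("ACGTAC", 2)
    # Windows: AC, CG, GT, TA, AC -> AC appears 2 of 5 times.
    print(vec[kmer_vocabulary(2).index("AC")])
```

Vector lengths grow as 4**k (4 for mononucleotides, 4096 for hexamers). This growth is one reason to compare feature sets by cross-validation instead of simply using the largest k.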
Pages: 9