Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Cited by: 18

Authors:

He, Wenying [1]

Ju, Ying [2]

Zeng, Xiangxiang [2]

Liu, Xiangrong [2]

Zou, Quan [1,3]

Affiliations:

[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China

[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China

Source:

FRONTIERS IN MICROBIOLOGY | 2018 / Volume 9

Funding:

National Natural Science Foundation of China;

Keywords:

non-coding DNA; DNA sequence; feature representation; genome synthesis; support vector machine; TERT PROMOTER MUTATIONS; PHYSICOCHEMICAL PROPERTIES; FEATURE-EXTRACTION; FEATURE-SELECTION; WEB SERVERS; PROTEIN; SITES; INFORMATION; RECURRENT; GENOME;

DOI:

10.3389/fmicb.2018.02174

Chinese Library Classification number:

Q93 [Microbiology];

Discipline classification codes:

071005 ; 100705 ;

Abstract:

With the rapid development of high-speed sequencing technologies and the implementation of many whole-genome sequencing projects, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage are emerging in succession. In particular, the rapid growth of DNA assembly technology may greatly advance the success of artificial life. DNA assembly, however, requires a large number of target sequences with known information as data support. Non-coding DNA (ncDNA) sequences make up most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. It is therefore necessary to develop machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and report a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected, using a support vector machine, from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer. Sc-ncDNAPred achieved an overall accuracy of 0.98.
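
The feature representations named in the abstract (mononucleotide through hexamer) correspond to k-mers with k = 1 to 6. The following is a minimal sketch of one common way to turn a DNA sequence into such a feature vector, assuming normalized occurrence frequencies over the alphabet A, C, G, T. The abstract does not specify the exact encoding, and the function name `kmer_frequencies` is hypothetical, not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalized k-mer frequencies of seq.

    The vector has 4**k entries, ordered lexicographically over "ACGT".
    This encoding is an assumption for illustration only.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Dimer (k = 2) features for a toy sequence: a vector of length 16.
features = kmer_frequencies("ACGTACGT", 2)
```

A vector like this could then be passed to any support vector machine implementation; for k = 6 it has 4096 entries, which is why comparing values of k matters for choosing a feature extraction strategy.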


Pages: 9
