Detection of differentially expressed genes using feature selection approach from RNA-seq

Times cited: 0
Authors
Piao, Yongjun [1]
Ryu, Keun Ho [1]
Affiliation
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database & Bioinformat Lab, Cheongju, South Korea
Source
2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP) | 2017
Funding
National Research Foundation of Singapore
Keywords
RNA-seq; differential expression analysis; feature selection; data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
With the advance of next-generation sequencing technology, RNA-seq has been widely adopted in transcriptomics as an alternative to microarrays. RNA-seq supports a wide range of applications, including gene expression quantification, alternative splicing identification, and novel transcript discovery. Generally, the primary aim of RNA-seq analysis is to detect genes that are differentially expressed between biological conditions. From the data mining point of view, discovering differentially expressed genes can be framed as a feature selection problem: identifying the genes that are most significant for discriminating between biological conditions. Feature selection methods for differential analysis of microarray data are well established in the literature, but few studies have examined feature selection in RNA-seq experiments. In this paper, we propose applying a data-mining feature selection method to differential expression analysis. Genes are ranked by symmetrical uncertainty, and significant genes are selected using a pre-defined relevance threshold. To evaluate the proposed method, we conducted a simulation study that assessed performance in terms of true and false positive rates. The experimental results demonstrated that a feature selection strategy can be applied to differential analysis of RNA-seq data and that it outperformed existing statistical approaches.
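The procedure summarized in the abstract (rank genes by symmetrical uncertainty with the condition labels, keep genes at or above a relevance threshold, then score the selection by true and false positive rates) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It uses the standard definition SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)]. The median-split discretization, the threshold of 0.5, all helper names and the toy expression values are hypothetical choices, because the abstract does not specify them.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(values: Sequence) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetrical_uncertainty(x: Sequence, y: Sequence) -> float:
    """SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)], in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy of paired values
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def discretize_median(values: Sequence[float]) -> list:
    """Hypothetical discretization: 1 if a sample is above the gene's median, else 0."""
    s = sorted(values)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [1 if v > median else 0 for v in values]


def rank_and_select(expr: dict, labels: Sequence, threshold: float):
    """Rank genes by SU with the condition labels; keep genes with SU >= threshold."""
    scores = {g: symmetrical_uncertainty(discretize_median(v), labels)
              for g, v in expr.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = {g for g, s in ranked if s >= threshold}
    return ranked, selected


def tpr_fpr(selected: set, true_de: set, all_genes: set):
    """True positive rate and false positive rate of a selected gene set."""
    negatives = all_genes - true_de
    tpr = len(selected & true_de) / len(true_de) if true_de else 0.0
    fpr = len(selected - true_de) / len(negatives) if negatives else 0.0
    return tpr, fpr


# Hypothetical toy data: three samples per condition, two truly DE genes.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [10, 12, 11, 50, 55, 52],  # up-regulated in condition 1
    "geneB": [20, 21, 19, 22, 20, 21],  # no real change
    "geneC": [5, 6, 4, 30, 28, 31],     # up-regulated in condition 1
}
ranked, selected = rank_and_select(expr, labels, threshold=0.5)
tpr, fpr = tpr_fpr(selected, {"geneA", "geneC"}, set(expr))
print(ranked)
print(sorted(selected), tpr, fpr)
```

On this toy input, geneA and geneC split perfectly by condition after discretization (SU = 1.0), while geneB scores about 0.08 and falls below the hypothetical threshold, so the selection recovers both DE genes with no false positives. In a real simulation study the threshold would be swept to trace TPR against FPR.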
Pages: 304-308
Number of pages: 5