Biological Pathway Analysis for de novo Transcriptomes through Multiple Reference Species Selections

Cited by: 3

Authors:

Liu, Chun-Cheng [1]

Chen, Chien-Ming [1]

Yang, Cin-Han [1]

Pai, Tun-Wen [1]

Lim, Phaik-Eem [2]

Phang, Siew-Moi [2,3]

Poong, Sze-Wan [2]

Lee, Kok-Keong [2,4]

Affiliations:

[1] Natl Taiwan Ocean Univ, Dept Comp Sci & Engn, Keelung, Taiwan

[2] Univ Malaya, Inst Ocean & Earth Sci, Kuala Lumpur, Malaysia

[3] Univ Malaya, Inst Biol Sci, Kuala Lumpur, Malaysia

[4] Univ Malaya, Inst Grad Studies, Kuala Lumpur, Malaysia

Source:

PROCEEDINGS OF 2016 10TH INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS (CISIS) | 2016

Keywords:

component; RNA-Seq; de novo species; multiple reference species; biological pathway;

DOI:

10.1109/CISIS.2016.73

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In de novo transcriptome analysis, gene mapping and genome annotation generally rely on the reference model species that is evolutionarily closest to the target species. However, not every selected reference species has comprehensive genome annotations and curated information, and the number of genes that can be mapped through it cannot be guaranteed. When too few genes are mapped from the selected reference species, the subsequent functional pathway analysis of the transcriptome dataset is seriously affected. To solve this problem, we propose an improved approach based on selecting multiple reference model species, aimed especially at KEGG pathway analysis of differentially expressed genes. By applying union operations to the genes mapped individually from each selected species, we significantly improve the completeness of gene mapping results in KEGG pathways and provide realistic P-values for each identified pathway. Furthermore, based on the mapped genes and KGML datasets, we apply various gray levels, colors, and shapes to present gene expression conditions on each biological pathway. Using NGS transcriptomic datasets from an unknown Antarctic green alga species as an experimental example, and selecting three published species (Chlamydomonas reinhardtii, Chlorella variabilis, and Coccomyxa subellipsoidea) as candidate reference species, we compared the pathway enrichment results obtained with different selections of reference species. Integrating all mapped genes from the various model species gave better results than using any single reference species. Under an identical P-value threshold, some important biological pathways that were otherwise missed could be retrieved, such as the Ribosome, Pyrimidine metabolism, and ABC transporters pathways. Therefore, we believe that appropriate selection of multiple reference species is necessary and significant for transcriptome analysis of de novo species.
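The union step and the per-pathway P-value described in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical test the authors used, so the one-sided hypergeometric test below is an assumption, chosen because it is a common choice for pathway enrichment. The function names, the assumption that genes from different species are merged at the level of shared KEGG ortholog identifiers, and the toy identifiers are all hypothetical and are not taken from the paper.

```python
from math import comb


def union_mapped_genes(per_species):
    """Merge the gene identifiers mapped through each reference species.

    Assumes (hypothetically) that identifiers are already comparable
    across species, e.g. KEGG ortholog IDs.
    """
    merged = set()
    for genes in per_species.values():
        merged |= set(genes)
    return merged


def hypergeom_pvalue(population, pathway_size, sample_size, hits):
    """One-sided hypergeometric P-value, P(X >= hits).

    population   : number of mapped background genes
    pathway_size : background genes that belong to the pathway
    sample_size  : number of differentially expressed genes (DEGs)
    hits         : DEGs that belong to the pathway
    """
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample_size)


# Toy mapping results; the identifiers are placeholders, not real data.
mapped = {
    "Chlamydomonas reinhardtii": {"K00001", "K00002"},
    "Chlorella variabilis": {"K00002", "K00003"},
    "Coccomyxa subellipsoidea": {"K00004"},
}
background = union_mapped_genes(mapped)   # four distinct identifiers
pathway = {"K00001", "K00003"}            # hypothetical pathway members
degs = {"K00001", "K00003", "K00004"}     # hypothetical DEGs

p = hypergeom_pvalue(
    population=len(background),
    pathway_size=len(pathway & background),
    sample_size=len(degs & background),
    hits=len(degs & pathway),
)
print(f"merged genes: {len(background)}, enrichment P-value: {p:.3f}")
```

Under this assumed test, enlarging the background by merging species changes both the number of pathway hits and the population size, which is one way the union could affect which pathways pass a fixed P-value threshold.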


Pages: 210-214

Page count: 5
