Biological Pathway Analysis for de novo Transcriptomes through Multiple Reference Species Selections

Cited by: 3
Authors
Liu, Chun-Cheng [1]
Chen, Chien-Ming [1]
Yang, Cin-Han [1]
Pai, Tun-Wen [1]
Lim, Phaik-Eem [2]
Phang, Siew-Moi [2, 3]
Poong, Sze-Wan [2]
Lee, Kok-Keong [2, 4]
Affiliations
[1] Natl Taiwan Ocean Univ, Dept Comp Sci & Engn, Keelung, Taiwan
[2] Univ Malaya, Inst Ocean & Earth Sci, Kuala Lumpur, Malaysia
[3] Univ Malaya, Inst Biol Sci, Kuala Lumpur, Malaysia
[4] Univ Malaya, Inst Grad Studies, Kuala Lumpur, Malaysia
Source
PROCEEDINGS OF 2016 10TH INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS (CISIS) | 2016
Keywords
component; RNA-Seq; de novo species; multiple reference species; biological pathway;
DOI
10.1109/CISIS.2016.73
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For de novo transcriptome analysis, a common approach to gene mapping and genome annotation is to choose the reference model species that is closest in evolutionary distance. However, not every selected reference model species has comprehensive genome annotations and curated information, and the number of genes that can be mapped through the selected species cannot be guaranteed. When too few genes are mapped from the selected reference model species, the subsequent functional pathway analysis of the transcriptome datasets is seriously affected. To solve this problem, we propose an improved approach based on the selection of multiple reference model species, aimed in particular at KEGG pathway analysis of differentially expressed genes. By applying union operations to the genes mapped individually from the different selected species, we could significantly improve the completeness of gene mapping results in KEGG pathways and provide realistic P-values for each identified pathway. Furthermore, based on the mapped genes and KGML datasets, we applied various gray levels, colors, and shapes to present gene expression conditions on each biological pathway. Taking NGS transcriptomic datasets from an unknown Antarctic green alga species as an experimental example, and selecting three published species (Chlamydomonas reinhardtii, Chlorella variabilis, and Coccomyxa subellipsoidea) as candidate reference species, we compared the results of pathway enrichment analysis under different selections of reference species. We found that integrating all mapped genes from the various model species gave better results than using any single reference species. Some important biological pathways that were otherwise missed, such as the Ribosome, Pyrimidine metabolism, and ABC transporters pathways, could be retrieved under an identical P-value threshold. Therefore, we believe that appropriate selection of multiple reference species is necessary and significant for transcriptome analysis of de novo species.
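The union-then-enrich idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which statistical test produces the P-values, so a one-sided hypergeometric test is assumed here as a common choice for pathway enrichment. The function names and the gene identifiers (`g1` to `g8`) are hypothetical placeholders, not real KEGG entries or results from the paper.

```python
from math import comb


def union_mapped_genes(per_species):
    """Merge the gene sets mapped through each reference species."""
    merged = set()
    for genes in per_species.values():
        merged |= set(genes)
    return merged


def enrichment_pvalue(background, degs, pathway):
    """One-sided hypergeometric P(X >= k) for a single pathway.

    Assumed test; the abstract does not name the statistic it uses.
    """
    n_background = len(background)
    n_pathway = len(pathway & background)
    n_degs = len(degs & background)
    n_hits = len(degs & pathway & background)
    upper = min(n_pathway, n_degs)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_degs - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / comb(n_background, n_degs)


# Hypothetical toy mapping: placeholder gene IDs per reference species.
per_species = {
    "Chlamydomonas reinhardtii": {"g1", "g2", "g3", "g4"},
    "Chlorella variabilis": {"g3", "g5", "g6"},
    "Coccomyxa subellipsoidea": {"g1", "g6", "g7", "g8"},
}
pathway = {"g1", "g5", "g7"}      # hypothetical pathway members
degs = {"g1", "g2", "g5", "g7"}   # hypothetical differentially expressed genes

union_background = union_mapped_genes(per_species)
p_union = enrichment_pvalue(union_background, degs, pathway)
p_single = enrichment_pvalue(
    per_species["Chlamydomonas reinhardtii"], degs, pathway
)
print(f"union: {p_union:.4f}  single species: {p_single:.4f}")
```

In this toy case only one pathway gene is visible through the single reference species, so the pathway cannot reach a small P-value, while the union exposes all three pathway genes. This mirrors, in miniature, the abstract's claim that pathways missed with one reference species can be recovered at the same threshold.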
Pages: 210-214
Page count: 5