A survey on identification and quantification of alternative polyadenylation sites from RNA-seq data

Cited by: 25
Authors
Chen, Moliang [1]
Ji, Guoli [1,2]
Fu, Hongjuan [1]
Lin, Qianmin [3]
Ye, Congting [4]
Ye, Wenbin [1]
Su, Yaru [5]
Wu, Xiaohui [1,6]
Affiliations
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Fujian, Peoples R China
[2] Xiamen Res Inst, Xiamen, Peoples R China
[3] Xiamen Univ, Xiangan Hosp, Xiamen, Peoples R China
[4] Xiamen Univ, Coll Environm & Ecol, Xiamen, Peoples R China
[5] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou, Peoples R China
[6] Xiamen Res Inst, Natl Ctr Healthcare Big Data, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
alternative polyadenylation; RNA-seq; 3' untranslated region; benchmark; predictive modeling; 3' UNTRANSLATED REGIONS; CHANGE-POINT MODEL; GENE-EXPRESSION; MESSENGER-RNAS; POLY(A) SITES; CLEAVAGE; REVEALS; WIDESPREAD; MECHANISMS; DYNAMICS
DOI
10.1093/bib/bbz068
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Alternative polyadenylation (APA) has been implicated to play an important role in post-transcriptional regulation by regulating mRNA abundance, stability, localization and translation, which contributes considerably to transcriptome diversity and gene expression regulation. RNA-seq has become a routine approach for transcriptome profiling, generating unprecedented data that could be used to identify and quantify APA site usage. A number of computational approaches for identifying APA sites and/or dynamic APA events from RNA-seq data have emerged in the literature, which provide valuable yet preliminary results that should be refined to yield credible guidelines for the scientific community. In this review, we provided a comprehensive overview of the status of currently available computational approaches. We also conducted objective benchmarking analysis using RNA-seq data sets from different species (human, mouse and Arabidopsis) and simulated data sets to present a systematic evaluation of 11 representative methods. Our benchmarking study showed that the overall performance of all tools investigated is moderate, reflecting that there is still a lot of scope to improve the prediction of APA sites or dynamic APA events from RNA-seq data. Particularly, prediction results from individual tools differ considerably, and only a limited number of predicted APA sites or genes are common among different tools. Accordingly, we attempted to give some advice on how to assess the reliability of the obtained results. We also proposed practical recommendations on the appropriate method applicable to diverse scenarios and discussed implications and future directions relevant to profiling APA from RNA-seq data.
Pages: 1261-1276
Page count: 16