Ambiguous genes due to aligners and their impact on RNA-seq data analysis

Cited by: 2
Authors
Szabelska-Beresewicz, Alicja [1]
Zyprych-Walczak, Joanna [1]
Siatkowski, Idzi [1]
Okoniewski, Michal [2]
Affiliations
[1] Poznan Univ Life Sci, Dept Math & Stat Methods, Wojska Polskiego 28, PL-60637 Poznan, Poland
[2] Swiss Fed Inst Technol, Sci IT Serv, Weinbergstr 11, CH-8092 Zurich, Switzerland
Keywords
REPRODUCIBILITY;
DOI
10.1038/s41598-023-41085-6
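Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code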
07 ; 0710 ; 09 ;
Abstract
This study focuses on ambiguous genes, i.e. genes whose expression is difficult to estimate from data produced by next-generation sequencing technologies. We focused on RNA sequencing (RNA-Seq) experiments performed on the Illumina platform. It is crucial to identify such genes and to understand why they are difficult to quantify, as they may be involved in disease. Misleading expression estimates for these genes could contribute to a misunderstanding of the cause of certain diseases and, in turn, to inappropriate treatment. We hypothesized that ambiguous genes would be difficult to map because of their complex structure. We therefore analyzed RNA-Seq data with several different mappers to find genes whose measured expression differs between aligners. We identified such genes using a generalized linear model with two factors: the mapper and the group introduced by the experiment. A large proportion of the ambiguous genes are pseudogenes. The high sequence similarity of pseudogenes to functional genes may indicate problems in alignment procedures. In addition, a predictive analysis assessed how difficult genes perform in classification: the effectiveness of classifying samples into specific groups was compared using the expression of difficult and non-difficult genes as covariates. In almost all cases considered, ambiguous genes had less predictive power.
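The two-factor model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the model family, so a Poisson log-linear model is assumed here, together with a balanced design (the same number of replicates for every mapper and group). Under those assumptions the fitted means have a closed form, and a likelihood-ratio statistic compares the model with group only against the model with group plus mapper. The function name `mapper_effect_lrt`, the mapper labels, and the counts are all hypothetical.

```python
import math

def mapper_effect_lrt(counts):
    """Likelihood-ratio statistic for a mapper effect on one gene.

    counts[mapper][group] is a list of replicate read counts.
    Assumptions (not from the source): Poisson counts, a balanced design,
    and main effects only (no mapper-by-group interaction).
    Returns (statistic, degrees_of_freedom).
    """
    mappers = sorted(counts)
    groups = sorted(counts[mappers[0]])
    n_rep = len(counts[mappers[0]][groups[0]])

    # Marginal totals are the sufficient statistics of the log-linear model.
    mapper_tot = {m: sum(sum(counts[m][g]) for g in groups) for m in mappers}
    group_tot = {g: sum(sum(counts[m][g]) for m in mappers) for g in groups}
    grand = sum(mapper_tot.values())
    df = len(mappers) - 1
    if grand == 0:
        return 0.0, df  # gene with no reads: nothing to compare

    stat = 0.0
    for m in mappers:
        for g in groups:
            # Fitted mean under "group only": the group average.
            mu_null = group_tot[g] / (len(mappers) * n_rep)
            # Fitted mean under "group + mapper" in a balanced design.
            mu_full = mapper_tot[m] * group_tot[g] / (grand * n_rep)
            for y in counts[m][g]:
                if y > 0:  # zero counts contribute nothing to the sum
                    stat += 2.0 * y * math.log(mu_full / mu_null)
    return stat, df

# Hypothetical counts: three mappers, two groups, two replicates each.
concordant = {
    "mapper_A": {"control": [10, 12], "treated": [30, 28]},
    "mapper_B": {"control": [10, 12], "treated": [30, 28]},
    "mapper_C": {"control": [10, 12], "treated": [30, 28]},
}
discordant = {
    "mapper_A": {"control": [10, 12], "treated": [30, 28]},
    "mapper_B": {"control": [10, 12], "treated": [30, 28]},
    "mapper_C": {"control": [1, 0], "treated": [3, 2]},
}

stat_concordant, df = mapper_effect_lrt(concordant)
stat_discordant, _ = mapper_effect_lrt(discordant)
```

A gene would be flagged as a candidate ambiguous gene when its statistic is large relative to a chi-square distribution with `df` degrees of freedom. Real RNA-Seq counts are usually overdispersed, so a negative binomial model fitted with a dedicated package would likely be a better choice in practice.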
Pages: 11
Related papers
50 in total
  • [31] PUseqClust: A Clustering Analysis Method for RNA-Seq Data
    Shi X.-F.
    Liu X.-J.
    Zhang L.
    Ruan Jian Xue Bao/Journal of Software, 2019, 30 (09): 2857 - 2868
  • [32] Impact of gene annotation choice on the quantification of RNA-seq data
    Chisanga, David
    Liao, Yang
    Shi, Wei
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [33] Intron Retention as a Mode for RNA-Seq Data Analysis
    Zheng, Jian-Tao
    Lin, Cui-Xiang
    Fang, Zhao-Yu
    Li, Hong-Dong
    FRONTIERS IN GENETICS, 2020, 11
  • [34] Getting the most out of RNA-seq data analysis
    Khang, Tsung Fei
    Lau, Ching Yee
    PEERJ, 2015, 3
  • [35] De novo assembly and analysis of RNA-seq data
    Robertson, Gordon
    Schein, Jacqueline
    Chiu, Readman
    Corbett, Richard
    Field, Matthew
    Jackman, Shaun D.
    Mungall, Karen
    Lee, Sam
    Okada, Hisanaga Mark
    Qian, Jenny Q.
    Griffith, Malachi
    Raymond, Anthony
    Thiessen, Nina
    Cezard, Timothee
    Butterfield, Yaron S.
    Newsome, Richard
    Chan, Simon K.
    She, Rong
    Varhol, Richard
    Kamoh, Baljit
    Prabhu, Anna-Liisa
    Tam, Angela
    Zhao, YongJun
    Moore, Richard A.
    Hirst, Martin
    Marra, Marco A.
    Jones, Steven J. M.
    Hoodless, Pamela A.
    Birol, Inanc
    NATURE METHODS, 2010, 7 (11) : 909 - U62
  • [36] A survey of best practices for RNA-seq data analysis
    Conesa, Ana
    Madrigal, Pedro
    Tarazona, Sonia
    Gomez-Cabrero, David
    Cervera, Alejandra
    McPherson, Andrew
    Szczesniak, Michal Wojciech
    Gaffney, Daniel J.
    Elo, Laura L.
    Zhang, Xuegong
    Mortazavi, Ali
    GENOME BIOLOGY, 2016, 17
  • [37] De novo assembly and analysis of RNA-seq data
    Gordon Robertson
    Jacqueline Schein
    Readman Chiu
    Richard Corbett
    Matthew Field
    Shaun D Jackman
    Karen Mungall
    Sam Lee
    Hisanaga Mark Okada
    Jenny Q Qian
    Malachi Griffith
    Anthony Raymond
    Nina Thiessen
    Timothee Cezard
    Yaron S Butterfield
    Richard Newsome
    Simon K Chan
    Rong She
    Richard Varhol
    Baljit Kamoh
    Anna-Liisa Prabhu
    Angela Tam
    YongJun Zhao
    Richard A Moore
    Martin Hirst
    Marco A Marra
    Steven J M Jones
    Pamela A Hoodless
    Inanc Birol
    Nature Methods, 2010, 7 : 909 - 912
  • [38] shortran: a pipeline for small RNA-seq data analysis
    Gupta, Vikas
    Markmann, Katharina
    Pedersen, Christian N. S.
    Stougaard, Jens
    Andersen, Stig U.
    BIOINFORMATICS, 2012, 28 (20) : 2698 - 2700
  • [39] sRNAflow: A Tool for the Analysis of Small RNA-Seq Data
    Zayakin, Pawel
    NON-CODING RNA, 2024, 10 (01)
  • [40] Differential expression analysis for paired RNA-seq data
    Lisa M Chung
    John P Ferguson
    Wei Zheng
    Feng Qian
    Vincent Bruno
    Ruth R Montgomery
    Hongyu Zhao
    BMC Bioinformatics, 14