Ambiguous genes due to aligners and their impact on RNA-seq data analysis

Authors
Szabelska-Beresewicz, Alicja [1]
Zyprych-Walczak, Joanna [1]
Siatkowski, Idzi [1]
Okoniewski, Michal [2]
Affiliations
[1] Poznan Univ Life Sci, Dept Math & Stat Methods, Wojska Polskiego 28, PL-60637 Poznan, Poland
[2] Swiss Fed Inst Technol, Sci IT Serv, Weinbergstr 11, CH-8092 Zurich, Switzerland
Keywords
REPRODUCIBILITY
DOI
10.1038/s41598-023-41085-6
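Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract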
This study focuses on ambiguous genes, i.e. genes whose expression is difficult to estimate from data produced by next-generation sequencing technologies. We consider RNA sequencing (RNA-Seq) experiments performed on the Illumina platform. Identifying such genes and understanding why they are difficult is crucial, because they may be involved in disease: misleading expression estimates could contribute to a misunderstanding of the cause of certain diseases and, in turn, to inappropriate treatment. We hypothesized that ambiguous genes are difficult to map because of their complex structure, so we analyzed RNA-Seq data with different mappers to find genes whose measurements differ between aligners. We identified such genes using a generalized linear model with two factors: the mapper and the groups introduced by the experiment. A large proportion of ambiguous genes are pseudogenes. The high sequence similarity of pseudogenes to functional genes may indicate problems in alignment procedures. In addition, a predictive analysis assessed the performance of difficult genes in classification: the effectiveness of classifying samples into specific groups was compared using the expression of difficult and non-difficult genes as covariates. In almost all cases considered, ambiguous genes had less predictive power.
Pages: 11
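
The abstract describes flagging genes with a two-factor model (mapper and experimental group). The abstract does not state the model family or the test used, so the following is only a rough sketch of the idea under a simplifying assumption: an additive Gaussian linear model on log2(count + 1) with a balanced design, where the estimates have a closed form. The function name `mapper_effect_f` and the toy counts are hypothetical and are not taken from the paper.

```python
from math import log2
from statistics import mean


def mapper_effect_f(counts):
    """F statistic for the mapper factor in an additive two-factor model.

    counts[mapper][group] is a list of replicate read counts for one gene.
    Assumes a balanced design (same number of replicates in every cell).
    This is a simplified stand-in, not the model used in the paper.
    """
    mappers = sorted(counts)
    groups = sorted(counts[mappers[0]])
    # Log-transform so that the additive Gaussian model is more reasonable.
    y = {m: {g: [log2(c + 1) for c in counts[m][g]] for g in groups}
         for m in mappers}

    all_vals = [v for m in mappers for g in groups for v in y[m][g]]
    grand = mean(all_vals)
    n_rep = len(y[mappers[0]][groups[0]])

    # In a balanced design the factor effects are just marginal means.
    mapper_mean = {m: mean([v for g in groups for v in y[m][g]])
                   for m in mappers}
    group_mean = {g: mean([v for m in mappers for v in y[m][g]])
                  for g in groups}

    # Sum of squares explained by the mapper factor.
    ss_mapper = n_rep * len(groups) * sum(
        (mapper_mean[m] - grand) ** 2 for m in mappers)

    # Residual sum of squares around the additive fit.
    ss_resid = sum(
        (v - (mapper_mean[m] + group_mean[g] - grand)) ** 2
        for m in mappers for g in groups for v in y[m][g])

    df_mapper = len(mappers) - 1
    df_resid = len(all_vals) - len(mappers) - len(groups) + 1
    if ss_resid == 0:
        return float("inf") if ss_mapper > 0 else 0.0
    return (ss_mapper / df_mapper) / (ss_resid / df_resid)


# Hypothetical toy data: two mappers, two groups, two replicates each.
stable_gene = {
    "mapperA": {"control": [100, 110], "treated": [200, 210]},
    "mapperB": {"control": [100, 110], "treated": [200, 210]},
}
ambiguous_gene = {
    "mapperA": {"control": [100, 110], "treated": [200, 210]},
    "mapperB": {"control": [10, 12], "treated": [20, 22]},
}

f_stable = mapper_effect_f(stable_gene)
f_ambiguous = mapper_effect_f(ambiguous_gene)
print(f_stable, f_ambiguous)
```

A gene whose counts agree across mappers yields a mapper F statistic near zero, while a gene counted very differently by the two mappers yields a large one. A real analysis would more likely fit a count-based model (for example a negative binomial GLM) and correct for multiple testing across genes.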