Ambiguous genes due to aligners and their impact on RNA-seq data analysis

Cited by: 2
Authors
Szabelska-Beresewicz, Alicja [1]
Zyprych-Walczak, Joanna [1]
Siatkowski, Idzi [1]
Okoniewski, Michal [2]
Affiliations
[1] Poznan Univ Life Sci, Dept Math & Stat Methods, Wojska Polskiego 28, PL-60637 Poznan, Poland
[2] Swiss Fed Inst Technol, Sci IT Serv, Weinbergstr 11, CH-8092 Zurich, Switzerland
Keywords
REPRODUCIBILITY
DOI
10.1038/s41598-023-41085-6
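Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09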
Abstract
This study focuses on ambiguous genes, i.e. genes whose expression is difficult to estimate from data produced by next-generation sequencing technologies. We focused on RNA sequencing (RNA-Seq) experiments performed on the Illumina platform. It is crucial to identify such genes and understand why they are difficult to quantify, as they may be involved in some diseases. Misleading expression estimates could contribute to a misunderstanding of the cause of certain diseases, which could lead to inappropriate treatment. We hypothesized that ambiguous genes would be difficult to map because of their complex structure. We therefore ran the RNA-Seq analysis with different mappers to find genes whose expression measurements differ between aligners. We identified such genes using a generalized linear model with two factors: the mapper and the experimental group. A large proportion of ambiguous genes are pseudogenes. The high sequence similarity of pseudogenes to functional genes may indicate problems in alignment procedures. In addition, a predictive analysis assessed the performance of difficult genes in classification: we compared how effectively samples were classified into specific groups when the expression of difficult and not difficult genes was used as covariates. In almost all cases considered, ambiguous genes have less predictive power.
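The idea of a two-factor model with mapper and group terms can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a plain Poisson log-linear model, a balanced design, and independent observations, none of which the abstract states, and the counts and mapper names are hypothetical. Under those assumptions the additive model (mapper + group) and the reduced model (group only) have closed-form fitted values, so the likelihood-ratio statistic for a mapper effect on one gene can be computed directly.

```python
import math


def mapper_effect_lr(counts):
    """Likelihood-ratio statistic for a mapper effect on one gene.

    counts[mapper][group] is a list of replicate read counts.
    Assumes a balanced design and Poisson counts (a simplification).
    Returns (statistic, degrees_of_freedom).
    """
    mappers = sorted(counts)
    groups = sorted(counts[mappers[0]])
    n_rep = len(counts[mappers[0]][groups[0]])
    m = len(mappers)

    # Marginal totals are the sufficient statistics of the additive model.
    mapper_tot = {a: sum(sum(counts[a][g]) for g in groups) for a in mappers}
    group_tot = {g: sum(sum(counts[a][g]) for a in mappers) for g in groups}
    grand = sum(mapper_tot.values())
    if grand == 0:
        return 0.0, m - 1

    lr = 0.0
    for a in mappers:
        for g in groups:
            # Fitted mean under log(mu) = intercept + mapper + group.
            mu_full = mapper_tot[a] * group_tot[g] / (grand * n_rep)
            # Fitted mean under log(mu) = intercept + group.
            mu_reduced = group_tot[g] / (m * n_rep)
            for y in counts[a][g]:
                if y > 0:
                    lr += 2.0 * y * math.log(mu_full / mu_reduced)
    return lr, m - 1


# Hypothetical counts for one gene: three mappers, two groups, two replicates.
example_gene = {
    "mapper_A": {"control": [100, 110], "treated": [200, 190]},
    "mapper_B": {"control": [50, 55], "treated": [100, 95]},
    "mapper_C": {"control": [10, 11], "treated": [20, 19]},
}
statistic, df = mapper_effect_lr(example_gene)
print(f"LR statistic = {statistic:.2f} on {df} degrees of freedom")
```

A large statistic relative to a chi-square distribution with the returned degrees of freedom would flag the gene as one whose counts depend on the mapper. Real RNA-Seq counts are typically overdispersed, and the same samples are aligned by every mapper, so a practical analysis would need a model that accounts for both.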
Pages: 11
Source
Scientific Reports, 13