Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach

Cited by: 12
Authors
Akond, Zobaer [1, 2, 4]
Alam, Munirul [2]
Mollah, Md. Nurul Haque [3]
Affiliations
[1] BARI, Agr Stat & Informat & Commun Technol ASICT Div, Gazipur 1701, Bangladesh
[2] Univ Rajshahi, Inst Environm Sci, Rajshahi 6205, Bangladesh
[3] Int Ctr Diarrheal Dis Res Bangladesh, Infect Dis Div, Emerging Infect, Rajshahi, Bangladesh
[4] Univ Rajshahi, Dept Stat, Bioinformat Lab, Rajshahi 6205, Bangladesh
Keywords
RNA-seq data; differentially expressed genes; robust t-statistic; gene-disease network; protein-protein interaction
DOI
10.6026/97320630014153
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Identifying biomarkers as differentially expressed genes (DEGs) from RNA-sequencing data is an important task in characterizing transcriptomic data, made possible by advances in next-generation sequencing (NGS) technology. Several statistical methods, such as edgeR, SAMSeq, and voom-limma, identify DEGs from high-dimensional RNA-seq count data across groups or conditions. However, these methods produce many false positives and low accuracy in the presence of outliers. We describe a robust t-statistic method that overcomes these drawbacks and evaluate it on both simulated and real RNA-seq datasets. At 20% outliers, the reported performance is 61.2% sensitivity, 35.2% specificity, 21.6% MER, 6.9% FDR, 74.5% AUC, 78.4% ACC, 93.1% PPV, and 35.2% NPV. Using the robust t-test, we identified 409 DE genes with p-values < 0.05 in a real dataset comparing HIV viremic and aviremic states. Of these, 28 genes are up-regulated and 381 are down-regulated, as estimated by log2 fold change (FC) at a threshold of 1.5. The up-regulated genes form three clusters, and 11 genes are found to be highly associated with HIV-1/AIDS. Protein-protein interaction (PPI) analysis of the up-regulated genes using the STRING database found 21 genes with strong associations among themselves. Thus, the identification of potential biomarkers from an RNA-seq dataset using a robust t-statistic model is demonstrated.
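As a reading aid for the performance figures in the abstract, the following minimal sketch shows how most of the listed measures are conventionally derived from a confusion matrix of true versus called DE genes. It is not the authors' code. The function name `classification_metrics` and the example counts are hypothetical, and expanding MER as the misclassification error rate is an assumption. AUC is left out because it requires per-gene scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Conventional confusion-matrix measures, returned as fractions.

    tp/fn: truly DE genes that were called / missed.
    fp/tn: non-DE genes that were called / correctly left uncalled.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "MER": (fp + fn) / total,       # assumed: misclassification error rate
        "FDR": fp / (tp + fp),          # false discovery rate
        "ACC": (tp + tn) / total,       # accuracy
        "PPV": tp / (tp + fp),          # positive predictive value
        "NPV": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, chosen only to illustrate the definitions.
metrics = classification_metrics(tp=40, fp=10, tn=130, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these definitions MER and ACC sum to 100%, as do FDR and PPV. The reported values are consistent with that: 21.6% + 78.4% = 100% and 6.9% + 93.1% = 100%.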
Pages: 153-163
Page count: 11