NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods

Cited by: 10
Authors
Wu, Zhenfeng [1 ,2 ]
Liu, Weixiang [3 ]
Jin, Xiufeng [2 ]
Ji, Haishuo [2 ]
Wang, Hua [2 ]
Glusman, Gustavo [4 ]
Robinson, Max [4 ]
Liu, Lin [2 ]
Ruan, Jishou [1 ]
Gao, Shan [2 ]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Nankai Univ, Coll Life Sci, Tianjin, Peoples R China
[3] Shenzhen Univ, Hlth Sci Ctr, Sch Biomed Engn, Shenzhen, Peoples R China
[4] Inst Syst Biol, Seattle, WA USA
Funding
National Natural Science Foundation of China
Keywords
gene expression; normalization; evaluation; R package; scRNA-seq; differential expression; RNA
DOI
10.3389/fgene.2019.00400
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Data normalization is a crucial step in gene expression analysis because it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: what principles should guide the evaluation of normalization methods? In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best normalization method for their gene expression data by evaluating different methods (particularly data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.
Pages: 8