BatchI: Batch effect Identification in high-throughput screening data using a dynamic programming algorithm

Cited by: 17
Authors
Papiez, Anna [1]
Marczyk, Michal [1, 2]
Polanska, Joanna [1]
Polanski, Andrzej [3]
Affiliations
[1] Silesian Tech Univ, Inst Automat Control, PL-44100 Gliwice, Poland
[2] Yale Univ, Yale Sch Med, Dept Internal Med, New Haven, CT 06510 USA
[3] Silesian Tech Univ, Inst Informat, PL-44100 Gliwice, Poland
Keywords
GENE-EXPRESSION; CLASSIFICATION; ASSOCIATION; CANCER
DOI
10.1093/bioinformatics/bty900
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: In contemporary biological experiments, bias that interferes with the measurements requires careful handling. Batch effects are an important source of bias in high-throughput biological experiments, and diverse methods for removing them have been established. These include various normalization techniques, yet many require knowledge of the number of batches and of the assignment of samples to batches. Only a few can identify a batch effect of unknown structure. For this reason, an original batch identification algorithm based on dynamic programming is introduced for omics data that can be sorted on a timescale.
Results: The BatchI algorithm partitions a series of high-throughput experiment samples into sub-series corresponding to estimated batches. Dynamic programming is used to split the data so that dispersion between batches is maximal while dispersion within batches remains minimal. The procedure has been tested on a number of available datasets, with and without prior information about batch partitioning. Datasets with a priori identified batches were split accordingly, with agreement measured by the weighted average Dice index. Batch effect correction is justified by higher intra-group correlation. In the datasets without batch annotation, the identified batch divisions lead to improved parameters and quality of biological information, as shown by a literature study and Information Content. The outcome of the algorithm serves as a starting point for correction methods. It has been demonstrated that omitting the essential step of batch effect control may lead to the loss of valuable potential discoveries.
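The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the BatchI implementation: the abstract does not specify the dispersion measure or how the number of batches is chosen, so this sketch assumes a within-segment sum of squared deviations as the cost and takes the number of batches as an input. The names `within_cost_table`, `partition_series`, `n_batches` and `min_size` are hypothetical. For a fixed series, minimizing the total within-segment sum of squares is equivalent to maximizing the between-segment sum of squares, which is why a single cost covers both criteria.

```python
import numpy as np


def within_cost_table(x):
    """cost[i, j] = within-segment sum of squares of rows x[i:j] (assumed dispersion measure)."""
    n, p = x.shape
    # Prefix sums of values and squared values give each segment cost in O(p).
    s1 = np.vstack([np.zeros(p), np.cumsum(x, axis=0)])
    s2 = np.vstack([np.zeros(p), np.cumsum(x ** 2, axis=0)])
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            seg_sum = s1[j] - s1[i]
            seg_sq = s2[j] - s2[i]
            cost[i, j] = float(np.sum(seg_sq - seg_sum ** 2 / (j - i)))
    return cost


def partition_series(x, n_batches, min_size=1):
    """Split a time-ordered series into n_batches contiguous segments.

    x: samples in run order, shape (n,) or (n, features).
    Returns (segments, total_cost), where segments is a list of
    half-open index pairs (start, stop).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    if n_batches * min_size > n:
        raise ValueError("not enough samples for the requested batches")

    cost = within_cost_table(x)
    # best[k, j]: minimal cost of splitting the first j samples into k segments.
    best = np.full((n_batches + 1, n + 1), np.inf)
    back = np.zeros((n_batches + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_batches + 1):
        for j in range(k * min_size, n + 1):
            for i in range((k - 1) * min_size, j - min_size + 1):
                c = best[k - 1, i] + cost[i, j]
                if c < best[k, j]:
                    best[k, j] = c
                    back[k, j] = i

    # Walk the stored split points backwards to recover the segments.
    segments = []
    j = n
    for k in range(n_batches, 0, -1):
        i = back[k, j]
        segments.append((int(i), int(j)))
        j = i
    return segments[::-1], float(best[n_batches, n])


# Toy series with three clearly shifted levels (made-up numbers).
series = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 5.0, 10.0, 10.2]
segments, total = partition_series(series, n_batches=3)
print(segments, total)
```

In practice the number of batches is unknown, so a sketch like this would be run for several values of `n_batches` and combined with a model-selection criterion. Which criterion BatchI uses is not stated in the abstract.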
Pages: 1885-1892
Page count: 8