Big data analysis of human mitochondrial DNA substitution models: a regression approach

Cited by: 3
Authors
Hallak, Keren Levinstein [1]
Tzur, Shay [2]
Rosset, Saharon [1]
Affiliations
[1] Tel Aviv Univ, Sch Math Sci, Dept Stat & Operat Res, IL-6997801 Tel Aviv, Israel
[2] Hebrew Univ Jerusalem, Braun Sch Publ Hlth & Community Med, IL-9112102 Jerusalem, Israel
Keywords
Mitochondrial DNA; Substitution models; Regression; Partitioning; Context; MAXIMUM-LIKELIHOOD-ESTIMATION; CODON SUBSTITUTION; AMINO-ACID; PHYLOGENETIC ANALYSIS; NUCLEOTIDE; SELECTION; PARSIMONY; EVOLUTION; SEQUENCES; INFERENCE;
DOI
10.1186/s12864-018-5123-x
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: We study Phylotree, a comprehensive representation of the phylogeny of global human mitochondrial DNA (mtDNA) variations, to better understand the mtDNA substitution mechanism and its most influential factors. We consider a substitution model in which a set of genetic features may predict the rate at which mtDNA substitutions occur. To find an appropriate model, we perform an exhaustive analysis of the effect of multiple factors on the substitution rate through Negative Binomial and Poisson regressions. We examine three different inclusion options for each categorical factor: omission, inclusion as an explanatory variable, and by-value partitioning. The examined factors include genes, codon position, a CpG indicator, directionality, nucleotide, amino acid, codon, and context (neighboring nucleotides), in addition to other site-based factors. Partitioning a model by a factor's value results in several sub-models (one for each value), where the likelihoods of the sub-models can be combined to form a score for the entire model. Eventually, the leading models are considered as viable candidates for explaining mtDNA substitution rates.
Results: Initially, we introduce a novel clustering technique on genes, based on three similarity tests between pairs of genes, supporting previous results regarding gene functionalities in the mtDNA. These clusters are then used as a factor in our models. We present leading models for the protein-coding genes, rRNA and tRNA genes, and the control region, showing it is disadvantageous to separate the models of transitions/transversions, or synonymous/non-synonymous substitutions. We identify a context effect that cannot be attributed solely to protein-level constraints or CpG pairs. For protein-coding genes, we show that the substitution model should be partitioned into sub-models according to the codon position and input codon; additionally, we confirm that gene identity and cluster have no significant effect once the above factors are accounted for.
Conclusions: We leverage the large, high-confidence Phylotree mtDNA phylogeny to develop a new statistical approach. We model the substitution rates using regressions, allowing consideration of many factors simultaneously. This admits the use of model selection tools that help identify the set of factors best explaining the mutational dynamics when considered in tandem.
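The by-value partitioning idea in the abstract, where each factor value gets its own sub-model and the sub-model likelihoods are combined into one score, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an intercept-only Poisson model per partition (no covariates, no Negative Binomial variant), uses AIC as the combined score (an assumption; the abstract does not name the criterion), and runs on made-up per-site substitution counts labelled by a hypothetical codon-position factor.

```python
import math
from collections import defaultdict


def poisson_loglik(counts):
    """Maximized log-likelihood of an intercept-only Poisson model.

    For this model the maximum-likelihood rate is the sample mean.
    """
    lam = sum(counts) / len(counts)
    ll = 0.0
    for y in counts:
        # log P(y | lam) = y*log(lam) - lam - log(y!)
        term = y * math.log(lam) if y > 0 else 0.0
        ll += term - lam - math.lgamma(y + 1)
    return ll


def aic(loglik, n_params):
    """Akaike information criterion; lower is better (assumed score)."""
    return 2 * n_params - 2 * loglik


def partitioned_aic(counts, labels):
    """Fit one sub-model per factor value and combine their likelihoods.

    Sub-models are fit on disjoint sites, so their log-likelihoods add,
    and the parameter count is one rate per partition.
    """
    groups = defaultdict(list)
    for y, g in zip(counts, labels):
        groups[g].append(y)
    total_ll = sum(poisson_loglik(ys) for ys in groups.values())
    return aic(total_ll, len(groups))


# Made-up substitution counts per site, labelled by codon position (1, 2, 3).
counts = [1, 0, 2, 1, 0, 1, 9, 7, 8, 10, 6, 8]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3]

pooled = aic(poisson_loglik(counts), 1)     # factor omitted
split = partitioned_aic(counts, labels)     # factor used for partitioning
print("pooled AIC:", round(pooled, 2), "partitioned AIC:", round(split, 2))
```

On this toy data the partitioned model scores lower (better) than the pooled one, because the extra rate parameters are outweighed by the likelihood gain. The third option from the abstract, inclusion as an explanatory variable, would instead keep a single regression and add the factor as a covariate alongside the others.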
Pages: 13
Related papers
50 in total
[1]   An expanded sequence context model broadly explains variability in polymorphism levels across the human genome [J].
Aggarwala, Varun ;
Voight, Benjamin F. .
NATURE GENETICS, 2016, 48 (04) :349-+
[2]   NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION [J].
AKAIKE, H .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) :716-723
[3]   SEQUENCE AND ORGANIZATION OF THE HUMAN MITOCHONDRIAL GENOME [J].
ANDERSON, S ;
BANKIER, AT ;
BARRELL, BG ;
DEBRUIJN, MHL ;
COULSON, AR ;
DROUIN, J ;
EPERON, IC ;
NIERLICH, DP ;
ROE, BA ;
SANGER, F ;
SCHREIER, PH ;
SMITH, AJH ;
STADEN, R ;
YOUNG, IG .
NATURE, 1981, 290 (5806) :457-465
[4]  
[Anonymous], 1969, EVOLUTION PROTEIN MO
[5]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkr1065, 10.1093/nar/gkh121, 10.1093/nar/gkp985]
[6]   A "Copernican" Reassessment of the Human Mitochondrial DNA Tree from its Root [J].
Behar, Doron M. ;
van Oven, Mannis ;
Rosset, Saharon ;
Metspalu, Malt ;
Loogvali, Eva-Liis ;
Silva, Nuno M. ;
Kivisild, Toomas ;
Torroni, Antonio ;
Villems, Richard .
AMERICAN JOURNAL OF HUMAN GENETICS, 2012, 90 (04) :675-684
[7]   MITOCHONDRIAL-DNA AND HUMAN-EVOLUTION [J].
CANN, RL ;
STONEKING, M ;
WILSON, AC .
NATURE, 1987, 325 (6099) :31-36
[8]  
Clayton D A, 2000, Hum Reprod, V15 Suppl 2, P11
[9]  
CZELUSNIAK J, 1990, METHOD ENZYMOL, V183, P601
[10]   Regularities of context-dependent codon bias in eukaryotic genes [J].
Fedorov, A ;
Saxonov, S ;
Gilbert, W .
NUCLEIC ACIDS RESEARCH, 2002, 30 (05) :1192-1197