On the effective depth of viral sequence data

Cited by: 28

Authors:

Illingworth, Christopher J. R. [1, 2]

Roy, Sunando [3]

Beale, Mathew A. [4]

Tutill, Helena [3]

Williams, Rachel [3]

Breuer, Judith [3]

Affiliations:

[1] Univ Cambridge, Dept Genet, Cambridge, England

[2] Univ Cambridge, Ctr Math Sci, Dept Appl Maths & Theoret Phys, Cambridge, England

[3] UCL, Div Infect & Immun, London, England

[4] Wellcome Trust Sanger Inst, Cambridge, England

Source:

VIRUS EVOLUTION | 2017 / Volume 3 / Issue 02

Keywords:

population genetics; sequence data; evolutionary modelling; DIVERSITY; EVOLUTION; ERRORS; VIRUS; TRANSMISSION; POPULATIONS; ADAPTATION; TISSUES;

DOI:

10.1093/ve/vex030

Chinese Library Classification (CLC) number:

Q93 [Microbiology];

Discipline classification codes:

071005 ; 100705 ;

Abstract:

Genome sequence data are of great value in describing evolutionary processes in viral populations. However, in such studies, the extent to which the data accurately describe the viral population is a matter of importance. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected and the subsequent steps in viral processing. To investigate this phenomenon, we sequenced replica datasets spanning a range of viruses, in which the point at which samples were split differed in each case. These ranged from a dataset in which independent samples were collected from a single patient to another in which all processing steps up to sequencing were applied to a single sample before the sample was split and each replicate sequenced. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient; distortion of the composition of a population by the experimental procedure, or genuine within-host diversity between samples, may each affect the results. Where possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.


Pages: 9

References (56 in total):

[1] Mutational and fitness landscapes of an RNA virus revealed through population sequencing [J].