A note on efficient computation of haplotypes via perfect phylogeny

被引：23

作者：

Bafna, V ^{[1
]}

Gusfield, D

Hannenhalli, S

Yooseph, S

机构：

[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA

[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA

[3] Univ Penn, Dept Genet, Philadelphia, PA 19104 USA

[4] J Craig Venter Inst, Rockville, MD 20850 USA

来源：

JOURNAL OF COMPUTATIONAL BIOLOGY | 2004年 / 11卷 / 05期

关键词：

haplotype; genotype; perfect phylogeny; matrix multiplication;

D O I：

10.1089/cmb.2004.11.858

中图分类号：

Q5 [生物化学];

学科分类号：

071010 ; 081704 ;

摘要：

The problem of inferring haplotype phase from a population of genotypes has received a lot of attention recently. This is partly due to the observation that there are many regions on human genomic DNA where genetic recombination is rare ( Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). A Haplotype Map project has been announced by NIH to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping problem, as an algorithmic implication of the no-recombination in long blocks observation, together with the standard population-genetic assumption of infinite sites. Gusfield's solution based on matroid theory was followed by direct theta( nm(2)) solutions that use simpler techniques ( Bafna et al., 2003; Eskin et al., 2003), and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. ( 2003) and Eskin et al. (2003) be sped-up to O(nm + m(2)) time, which would imply an O(nm) time-bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes. We give reductions that suggests that the answer to both questions is "no." For the first problem, we show that computing the output of the first step ( in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm(omega-1)), where omega less than or equal to 2.52 is the exponent of matrix multiplication. Thus, any linear time solution to the PPH problem likely requires a different approach. For the second problem of computing a PPH solution that minimizes the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).

引用

页码：858 / 866

页数：9

共 24 条

[1]

[Anonymous], RECOMB ANN INT C RES

[2]

[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness

[3]

[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931

[4] Haplotyping as perfect phylogeny: A direct approach [J].

Bafna, V ;

Gusfield, D ;

Lancia, G ;

Yooseph, S .

JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (3-4) :323-340

[5]

Chung RH, 2003, LECT NOTES COMPUT SC, V2697, P5

[6] Perfect phylogeny haplotyper: haplotype inferral using a tree model [J].

Chung, RH ;

Gusfield, D .

BIOINFORMATICS, 2003, 19 (06) :780-781

[7]

CLARK AG, 1990, MOL BIOL EVOL, V7, P111

[8] Faculty uninformed in the information age? [J].

Copp, LA .

JOURNAL OF PROFESSIONAL NURSING, 1997, 13 (01) :1-2

[9] High-resolution haplotype structure in the human genome [J].

Daly, MJ ;

Rioux, JD ;

Schaffner, SE ;

Hudson, TJ ;

Lander, ES .

NATURE GENETICS, 2001, 29 (02) :229-232

[10]

EXCOFFIER L, 1995, MOL BIOL EVOL, V12, P921

← 1 2 3 →